Abstract - Many acoustic factors can contribute to the
classification  accuracy  of  ground  vehicles.  Classification  based 
on  a  single  feature  set  may  lose  some  useful  information.  To 
obtain  more  complete  knowledge  regarding  vehicles’  acoustic 
characteristics,  we  propose  a  fusion  approach  to  combine  two 
sets  of  features,  in  which  various  aspects  of  an  acoustic  signature 
are  emphasized  individually.  The  first  set  of  features  consists 
of  a  number  of  harmonic  components,  mainly  characterizing 
engine  noise.  The  second  set  of  features  is  a  group  of  key 
frequency  components,  designated  to  reflect  other  minor  but  also 
important  acoustic  factors,  such  as  tire  friction  noise.  To  find 
these  features,  we  apply  a  harmonic  extraction  and  a  mutual 
information  based  method  that  have  been  shown  effective  in  our 
previous  research.  Fusing  these  two  sets  of  features  provides 
a  more  complete  description  of  vehicles’  acoustic  signatures, 
and reduces the limitation of relying on one particular feature set.
In addition to a feature level fusion method, we propose a modified
Bayesian  based  fusion  method  to  take  advantage  of  matching 
each  specific  feature  set  with  its  favored  classifier.  To  assess  the 
proposed  fusion  method,  experiments  are  carried  out  based  on  a 
multi-category  vehicles  acoustic  data  set.  Results  indicate  that 
the  fusion  approach  can  effectively  increase  the  classification 
accuracy  compared  to  those  using  each  individual  set  of  features. 
Bayesian  based  decision  level  fusion  is  found  to  be  significantly 
better  than  the  feature  level  fusion  approach. 

Keywords:  Acoustic  vehicle  classification,  information 
fusion,  feature  extraction,  mutual  information,  Bayesian 
decision  fusion. 

I.  Introduction 

Acoustic  sensors  can  collect  acoustic  signals  to  identify 
the  type  of  moving  ground  vehicles.  Acoustic  sensors  can  be 
used  in  sensor  networks  for  applications  such  as  battlefield 
monitoring and surveillance. They are becoming more and more
attractive because of their rapid deployability and low cost [1]-[4].
In acoustic sensor processing, classification algorithms
play a critical role in identifying the type of vehicle, and help
to improve the performance of tracking [3], [5].

Many  acoustic  features  can  be  extracted  for  classification  of 
moving  vehicles.  The  commonly-used  features  are  the  levels 
of various harmonics [6], [7]. The harmonic features have
achieved  good  classification  performance,  with  a  stable  and 
compact  representation  [3],  [5].  Although  many  encouraging 
results  on  acoustic  vehicle  classification  have  been  shown  in 
previous  research  [1],  [3],  [5],  it  still  remains  a  challenging 
problem  due  to  the  complexity  of  vehicle  acoustic  signals,  the 
great  variation  of  ambient  interferences,  etc. 


In  particular,  most  classification  algorithms  that  have  been 
developed  for  acoustic  vehicle  classification  only  consider  one 
major  feature  set.  However,  the  overall  acoustic  signal  of  a 
running vehicle is often much more complicated; the vehicle’s
sound may come from multiple sources, not exclusively from
the engine, but also from tires, brakes, etc. Relying on one
particular feature extraction approach is therefore likely to lose
information. This could become even worse when the number
of model parameters is further restricted by other factors, such
as the dimensionality of a classifier’s input.

In  this  paper,  we  focus  on  information  fusion  approaches  for 
acoustic  vehicle  classification.  We  argue  that  the  capability  gap 
between  different  feature  sets  can  provide  potential  to  improve 
the  classification  accuracy  by  information  fusion.  Moreover, 
information  fusion  may  alleviate  the  constraint  on  the  input’s 
dimensionality  for  certain  classifiers.  For  example  in  decision 
level  fusion,  several  classifiers  can  be  applied  to  each  feature 
set  individually,  and  the  overall  dimensionality  of  input  is 
broken  down  by  the  number  of  the  classifiers. 

In  the  proposed  fusion  approach,  a  group  of  new  features 
are  firstly  extracted  to  amend  the  existing  harmonic  features. 
The  added  features  are  called  key  frequency  components,  and 
are  selected  by  mutual  information  (MI),  a  metric  based  on 
the  statistical  dependence  between  two  random  variables  [8]. 
The  detailed  selection  algorithm  that  we  applied  here  is 
based  on  our  previous  research  [9],  [10].  Selection  of  the 
key  acoustic  features  by  mutual  information  can  help  to 
retain  those  frequency  components  that  contribute  most  to 
the  discriminatory  information,  meeting  our  goal  of  fusing 
information  for  classification. 

To  keep  the  same  dimensionality  as  the  original  feature 
space,  a  feature  level  fusion  process  is  first  designed  by 
replacing the higher order (or other less important) harmonic
components with the same number of key frequency
components. For the purpose of fusion, the key frequency
components are deliberately selected to be unrelated to
the fundamental frequency. This scheme adds no extra cost
in  the  classification  algorithm,  but  has  potential  to  increase 
discriminatory  capability.  Next,  an  improved  Bayesian  based 
decision  level  fusion  is  proposed  to  take  advantage  of  matching 
each  specific  feature  set  with  its  preferred  classifiers.  To  assess 
the proposed MI-based acoustic feature extraction and the
subsequent fusion methods, experiments are carried out based
on a multi-category vehicle acoustic data set.

Report documentation (Standard Form 298): Acoustic Information Fusion for Ground Vehicle Classification. Report date: July 2008. Performing organization: U.S. Army Research Laboratory, 2800 Powdermill Road, Adelphi, MD 20783-1197. Presented at the 11th International Conference on Information Fusion, June 30 - July 3, 2008, Cologne, Germany. Approved for public release; distribution unlimited. Unclassified; 7 pages.

The rest of this paper is organized as follows. In Section II,
we argue that multiple feature sets are needed to improve
the vehicle classification accuracy. Next, in Section III, we
discuss how to use mutual information to extract the key
frequency components that supply the necessary new information.
Subsequently, to combine the harmonic features and the
key frequency features, we design a feature level fusion in
Section IV-A and propose a modified decision level fusion in
Section IV-B. Experimental results are presented in Section V.
Finally, we end this paper with conclusions in Section VI.

II. Using multiple feature sets for acoustic vehicle classification

Differing from previous research [1], [3]-[5], we first
argue that multiple feature sets should be considered for
more effective acoustic vehicle classification.

It is known that the acoustic signature of a running vehicle
is made up of a number of individual elements, such as engine
noise, tire friction noise, etc. Many classification algorithms
that have been developed for acoustic vehicle classification
were based on harmonic features, and good performance
has been achieved [1], [3]-[5]. However, the following discussion
suggests that the harmonic features may be unable
to capture the whole acoustic signature. For example,
tire noise is generated by the friction between the tires and the
road. The useful information embedded in this noise is not
necessarily related to the fundamental frequency and its integer
multiples. This indicates that the harmonics may be unable to
capture the useful distinguishing information in this particular
element.

Though the tire friction noise seems to be a minor constituent
of the whole vehicle’s sound, it could contain a
valuable acoustic signature, and sometimes could be important
to vehicle classification. For example, the tire friction noise can
reflect information regarding the tires’ tread pattern (e.g.,
tread shape, rubber blocks, etc.). These factors are closely
linked with the type of vehicle, and should not be omitted from
classification. Therefore, to improve the accuracy of acoustic
vehicle classification, we propose to apply information fusion
to include more useful acoustic information.

In the proposed fusion approach, two sets of features
are extracted individually to capture different aspects of the
acoustic signature. The first one is a commonly-used harmonic
feature vector [3], [5]-[7], denoted $\mathbf{x}_h$, which is used
to account for the engine noise. The second one is a key
frequency feature vector, denoted $\mathbf{x}_k$, which is aimed at other
useful information, such as the acoustic signature embedded
in the tires’ friction noise.

Based on the above feature extraction, the amended acoustic
signature consists of two parts, $\mathbf{x}_h$ and $\mathbf{x}_k$.
A natural way to exploit this structure is data
fusion. Because the two sets of features characterize the acoustic
signals from different aspects, combining them has the potential
to provide more information regarding the desired vehicle
acoustic signature.


The methods of extracting the harmonic features $\mathbf{x}_h$ can be
found in [6], [7]. Thus, the major problems remaining in this
fusion approach are:

• how to select the key frequency features $\mathbf{x}_k$, which will
be discussed in Section III; and

• how to develop a suitable fusion scheme, which will be
discussed in Section IV.

III. Extracting new features for harmonics-based vehicle classification

According to our discussion in Section II, the feature
vector $\mathbf{x}_k$ is intended to provide different information from the
harmonic feature vector $\mathbf{x}_h$. Thus, a practical way to
extract $\mathbf{x}_k$ is to search the residual non-harmonic part of the
spectrum for a group of key frequency components, whose
information will naturally differ from that of the harmonics.

To find the key frequency components, an ideal search
metric would be the classification accuracy or, inversely, the
Bayes classification error. However, feature selection by directly
minimizing the Bayes error is difficult to perform analytically,
and an alternative discriminatory metric has to
be sought. In this research, we applied an effective feature
selection method based on mutual information, which was
developed in our previous project. In this section, we
recapitulate several key equations used in this method; its full
details can be found in [9], [10].

In information theory, the mutual information is a quantity
that measures the mutual dependence of two variables, and
is defined as:

$$I(X, Y) = \int_Y \int_X p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy, \tag{1}$$

where $p(x, y)$ is the joint probability density function of the
continuous random variables $X$ and $Y$, and $p(x)$ and $p(y)$ are
the respective marginal probability density functions.
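As a minimal sketch of how (1) may be estimated from data, the following Python code replaces the integrals by sums over histogram bins. This is a plug-in approximation and not necessarily the estimator used in [9], [10]; the helper names `mutual_information` and `quantize`, and the choice of equal-width bins, are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X, Y) in nats for two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) equals c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def quantize(values, n_bins=8):
    """Map real values to equal-width bin indices (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

A spectral amplitude column could, for instance, be passed through `quantize` before being compared with the class labels; the number of bins trades bias against variance and would need tuning.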

Using mutual information as a search metric can
be justified by a bound that relates the mutual
information to the classification error. According to
Fano’s inequality,

$$H(Y|X) \le H(P_e) + P_e \log(l - 1), \tag{2}$$

where $l$ is the number of classes; the error probability is
$P_e = P\{f(X) \ne Y\}$; and $f$ is a classification function. Given
$C_1 = H(P_e)$, $C_2 = \log(l - 1)$, and using $I(X, Y) = H(Y) - H(Y|X)$,
an inequality relating MI and the classification error
probability is derived as follows:

$$P_e \ge \frac{H(Y) - I(X, Y) - C_1}{C_2}. \tag{3}$$

From (3), it is seen that the classification error $P_e$ is lower
bounded by a term determined by $I(X, Y)$ and $H(Y)$. Thus,
given a fixed $H(Y)$, maximizing the mutual information
reduces this lower bound on the classification error, so
using MI for feature selection is indirectly guided
by the classification error probability (or accuracy).
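The bound can also be evaluated numerically. The sketch below assumes that entropies are measured in nats and finds, by bisection, the smallest $P_e$ compatible with Fano's inequality (2) for a given $H(Y|X) = H(Y) - I(X, Y)$. It does this instead of using the closed form (3), in which $C_1$ is treated as a constant. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def fano_lower_bound(h_y_given_x, n_classes, tol=1e-10):
    """Smallest Pe with H(Pe) + Pe * log(l - 1) >= H(Y|X), for l >= 2 classes."""

    def rhs(pe):
        return binary_entropy(pe) + pe * math.log(n_classes - 1)

    # The right-hand side is non-decreasing on [0, (l - 1) / l].
    lo, hi = 0.0, (n_classes - 1) / n_classes
    if h_y_given_x <= 0.0:
        return 0.0
    if h_y_given_x >= rhs(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid) < h_y_given_x:
            lo = mid
        else:
            hi = mid
    return hi
```

Because the returned bound grows with $H(Y|X)$, a feature subset with larger $I(X, Y)$ yields a smaller bound, which is the behavior the argument above relies on.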
The framework of the MI-based feature selection can be
described as follows:

$$J(\mathbf{x}^o) = \max_{\mathbf{x} \subset \mathbf{x}'} I(\mathbf{x}, Y), \tag{4}$$

where $\mathbf{x}'$ is the original feature vector with $M$ components
or variables, and $Y$ is the corresponding output class label
(e.g., the vehicle type). To implement (4) effectively, there
are two obstacles to overcome: 1) how to evaluate a multi-dimensional
mutual information, since in $I(\mathbf{x}, Y)$, $\mathbf{x}$ is a vector;
and 2) how to search for the maximum, since the number of ways of
choosing $N$ from the $M$ features is $\binom{M}{N}$, which means that
a huge number of MI evaluations might be needed. To address
these problems, a gradient ascent optimization strategy
is applied to maximize MI, which was developed in our
previous research [10]. First, a multi-dimensional MI can
be decomposed into a series of one-dimensional MIs:

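$$I(\mathbf{x}, Y) \approx \sum_i I(X_i, Y) - \sum_i \sum_{j>i} I(X_i, X_j) + \sum_i \sum_{j>i} I(X_i, X_j | Y), \tag{5}$$

where $\mathbf{x} = (X_1, X_2, \ldots, X_M)$ is a random vector representing
the selected features $X_i$, $i = 1, 2, \ldots, M$, and $Y$ is the
random variable corresponding to the class label. Based on equation
(5), a fast approach to maximize $I(\mathbf{x}, Y)$ can be implemented
as follows: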

The first variable is chosen as:

$$X_1^o = \max_{i} I(X_i, Y), \tag{6}$$

where $X_1^o$ represents the result of the maximization at step 1.
Then, the second variable is chosen as:

$$X_2^o = \max_{X_i \ne X_1^o} \left[ I(X_i, Y) - I(X_i, X_1^o) + I(X_i, X_1^o | Y) \right]. \tag{7}$$

The remaining variables are chosen in the same way:

$$X_n^o = \max_{X_i \ne X_j^o} \left[ I(X_i, Y) - \sum_{j=1}^{n-1} I(X_i, X_j^o) + \sum_{j=1}^{n-1} I(X_i, X_j^o | Y) \right], \tag{8}$$

where $X_j^o$, $j = 1, 2, \ldots, n - 1$, are the variables already
selected. This selection is repeated until the pre-specified
number, $N$, of variables is reached.

The above strategy selects features sequentially, and so
avoids the problem of ‘combinatorial explosion’. At each step,
the next feature is selected so as to maximize $I(\mathbf{x}, Y)$
incrementally. This is a similar idea to gradient ascent or
other hill-climbing algorithms.
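The sequential selection in (6)-(8) can be sketched as follows for features that have already been discretized. This is an illustrative implementation that assumes plug-in (counting) estimates of the mutual information; the estimator used in [10] may differ, and all function names are hypothetical.

```python
import math
from collections import Counter


def mi(a, b):
    """Plug-in estimate of I(A, B) in nats for discrete sequences."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[x] * cb[y]))
               for (x, y), c in cab.items())


def cond_mi(a, b, y):
    """I(A, B | Y) computed as the class-weighted average of per-class MI."""
    n = len(y)
    total = 0.0
    for label, count in Counter(y).items():
        idx = [i for i in range(n) if y[i] == label]
        total += count / n * mi([a[i] for i in idx], [b[i] for i in idx])
    return total


def select_features(columns, y, n_select):
    """Greedy selection following (6)-(8); returns indices into `columns`."""
    relevance = [mi(col, y) for col in columns]
    selected = []
    while len(selected) < min(n_select, len(columns)):
        best, best_score = None, -math.inf
        for i, col in enumerate(columns):
            if i in selected:
                continue
            # Relevance, minus redundancy, plus class-conditional redundancy.
            score = relevance[i]
            for j in selected:
                score += cond_mi(col, columns[j], y) - mi(col, columns[j])
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

In this form, a feature that merely duplicates an already selected one receives no credit beyond the first copy, whereas a feature carrying complementary class information keeps its full relevance score.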

Although we have shown that the key frequency features
selected by mutual information can effectively provide useful
discriminatory information, it is not recommended to
completely replace the existing features, i.e., the harmonic
features. This is because the new features are extracted purely
by discriminatory analysis. The amount of information
extracted can be guaranteed, but the stability of the features is
uncertain. For example, changes in vehicle velocity are likely
to affect the selected results. So the key frequency features
are better considered as a supplement to
the major features, and a fusion approach should be applied
to utilize both of them. As long as this strategy is followed,
the final performance can improve if the key frequency
components capture new information, but will not degrade
significantly even if they fail to do so.

IV.  Fusing  acoustic  feature  sets 

A  natural  way  to  combine  the  multiple  feature  sets  for 
classification  is  by  information  fusion  [11].  Two  possible 
fusion  strategies  that  can  be  applied  for  this  task  are  feature 
level  fusion  and  decision  level  fusion,  which  are  discussed  as 
follows. 

A.  Feature  level  fusion 

Feature level fusion is a medium-level fusion strategy,
where features extracted from raw data are combined
for decision. One of the aims of this research is to test whether
the fusion of two sets of features can improve classification
accuracy. A fair assessment should be based on feature
vectors with the same dimensionality. Hence, the $L$ higher
order (or other less important) harmonics are replaced by the
same number of key frequency components to form a fused
feature vector. In this feature level fusion, since features from
different extraction methods are concatenated directly, a proper
normalization should be applied to address the difference in
measurement scale.
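A minimal sketch of this replacement scheme is given below. It assumes that the harmonic vector is ordered from the lowest to the highest harmonic, that the key frequency components are ordered by selection rank, and that z-score normalization with training-set statistics is an acceptable choice of normalization; none of these details is specified above, and the function names are hypothetical.

```python
import math


def fuse(harmonics, key_freqs, n_replace):
    """Replace the n_replace highest-order harmonics by key frequency components."""
    if n_replace > len(key_freqs) or n_replace > len(harmonics):
        raise ValueError("not enough components to replace")
    keep = len(harmonics) - n_replace
    return list(harmonics[:keep]) + list(key_freqs[:n_replace])


def zscore_params(rows):
    """Per-dimension mean and standard deviation estimated from training rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant dimensions
    return means, stds


def normalize(row, means, stds):
    """Apply z-score normalization so both feature types share one scale."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]
```

The fused vector keeps the original dimensionality, so the same classifier configuration can be used for the harmonic-only and the fused inputs.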

According  to  our  previous  discussion,  the  fused  feature 
vector  tends  to  depict  the  acoustic  signature  more  fully:  the 
harmonics  characterize  the  major  noise  sources  and  outline  the 
global  spectrum;  the  key  frequency  components  provide  other 
localized  details  of  the  spectrum. 

The implementation of this feature level fusion is straightforward.
However, one major problem associated with this
fusion scheme is that a single classifier has to be applied
to the fused feature set, which means that the two feature
sets will be classified by the same classification algorithm.
This is an unwanted consequence for this application, because
according to Section II the two feature sets have different
utilities and may each have their own favored classifier.
It is known that classification performance depends greatly on
the characteristics of the data, and there is no single classifier
that works best on all given data sets. Hence, to achieve
better performance, the following decision level fusion is further
investigated.

B.  Decision  level  fusion 

Decision level fusion is a high level fusion, where separate
intermediate decisions are first drawn from each individual
feature set and then combined to reach a global
decision.




In pattern classification, choosing a suitable classifier for
a given feature set is usually carried out by empirical tests.
In this application, following previous research [3],
[5], we choose the multivariate Gaussian classifier (MGC)
for the harmonic features. Support vector
machines (SVMs) [12], [13], which are currently popular, have
shown competitive performance
with the best available algorithms in many classification
areas, so they were chosen as the classifiers for the key frequency
component features. To combine the classification results
based on the two sets of features (i.e., the outputs from the
classifiers MGC and SVM), a Bayesian-based decision fusion
is applied, described as follows.

1) A modified Bayesian decision fusion method:


In the traditional Bayesian framework, several approaches
are adopted to combine probabilistic information. Let $\mathbf{x}_i$,
$i = 1, 2, \ldots, N$, be $N$ information sources, and $y$ the decision
result. According to the maximum a posteriori (MAP) criterion,
two commonly used fusion methods are as follows [11]:

$$p(y | \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N) \propto \prod_{i=1}^{N} p(y | \mathbf{x}_i), \tag{9}$$

$$p(y | \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N) \propto p(y) \prod_{i=1}^{N} p(\mathbf{x}_i | y). \tag{10}$$

It is known that both methods are based on certain
independence assumptions. However, in our application, the
two feature sets are extracted from the frequency response of
the same acoustic signal, and the independence assumption is
very unlikely to hold. So directly applying the above fusion
rules will result in a discrepancy from the expected MAP result.
To obtain a more accurate fusion, we propose the following
improved fusion criterion.
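For reference, the two conventional rules (9) and (10) can be sketched as follows. The code works in the log domain for numerical stability and assumes strictly positive probabilities; the function names and the list-of-lists input layout are assumptions made for illustration.

```python
import math


def probabilities_from_logs(log_scores):
    """Turn unnormalized log scores into probabilities that sum to one."""
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]


def fuse_posteriors(posteriors):
    """Rule (9): product of the per-source posteriors p(y | x_i).

    posteriors[i][c] is the posterior of class c given source i.
    """
    n_classes = len(posteriors[0])
    logs = [sum(math.log(p[c]) for p in posteriors) for c in range(n_classes)]
    return probabilities_from_logs(logs)


def fuse_likelihoods(prior, likelihoods):
    """Rule (10): prior p(y) times the product of likelihoods p(x_i | y).

    likelihoods[i][c] is the likelihood of source i under class c.
    """
    logs = [math.log(prior[c]) + sum(math.log(l[c]) for l in likelihoods)
            for c in range(len(prior))]
    return probabilities_from_logs(logs)
```

Both functions treat the sources as independent, which is exactly the assumption questioned above for features derived from the same spectrum.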

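First, based on our application scenario, two information
sources, i.e., $\mathbf{x}_1$ and $\mathbf{x}_2$, are considered. According to
Bayes’ rule, the posterior probability can be written as follows:

$$p(y | \mathbf{x}_1, \mathbf{x}_2) = \frac{p(\mathbf{x}_1)\, p(y | \mathbf{x}_1)\, p(\mathbf{x}_2 | y, \mathbf{x}_1)}{p(\mathbf{x}_1)\, p(\mathbf{x}_2 | \mathbf{x}_1)}. \tag{11}$$

For the decision purpose, equation (11) can be simplified
as:

$$\arg\max_y p(y | \mathbf{x}_1, \mathbf{x}_2) \propto p(y | \mathbf{x}_1)\, p(\mathbf{x}_2 | y, \mathbf{x}_1). \tag{12}$$

It can be seen that (12) reduces to (10) if $\mathbf{x}_1$ is
independent of $\mathbf{x}_2$ given $y$, since then
$p(\mathbf{x}_2 | y, \mathbf{x}_1) = p(\mathbf{x}_2 | y)$.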

To implement the fusion in (12), the conditional probabilities
$p(y | \mathbf{x}_1)$ and $p(\mathbf{x}_2 | y, \mathbf{x}_1)$ are needed. Following our
previous discussion, the posterior $p(y | \mathbf{x}_1)$ can be effectively
obtained from the SVM’s output (see Section IV-B3 for
details), and the likelihood $p(\mathbf{x}_2 | y)$ can be conveniently
obtained from the outputs of the MGC. The major remaining problem
is then to estimate $p(\mathbf{x}_2 | y, \mathbf{x}_1)$ based on all available information,
which can be formulated as follows:

$$\{ p(\mathbf{x}_2 | y), \mathbf{x}_1, \mathbf{x}_2 \} \rightarrow p(\mathbf{x}_2 | y, \mathbf{x}_1), \tag{13}$$

where $\rightarrow$ represents a deduction based on the information on the
left-hand side.

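So, for our specific application, a more accurate
MAP decision rule can be rewritten as:

$$\arg\max_y p(y | \mathbf{x}_h, \mathbf{x}_k) \propto p(y | \mathbf{x}_k)\, p(\mathbf{x}_h | y, \mathbf{x}_k) \approx p(y | \mathbf{x}_k)\, p_{\mathbf{x}_k}(\mathbf{x}_h | y), \tag{14}$$

where $\mathbf{x}_h$ and $\mathbf{x}_k$ represent the harmonic features and the key
frequency features respectively, and $p_{\mathbf{x}_k}(\mathbf{x}_h | y) = \hat{p}(\mathbf{x}_h | y, \mathbf{x}_k)$ is
an estimate of $p(\mathbf{x}_h | y, \mathbf{x}_k)$ given knowledge of $p(\mathbf{x}_h | y)$ and
$\mathbf{x}_k$. To obtain $p_{\mathbf{x}_k}(\mathbf{x}_h | y)$, i.e., to implement (13), we propose an
approach based on a simple information-theoretic criterion,
presented as follows.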

2) Modulating multi-dimensional Gaussian distribution:

Previous research [3], [5] has shown that the multi-dimensional
Gaussian distribution is an effective estimate
of the probability density of the harmonic features. So, letting $\mathbf{x}_h$
be a $d$-dimensional harmonic feature vector, the likelihood
function is:

$$p(\mathbf{x}_h | y) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x}_h - \mu)^T \Sigma^{-1} (\mathbf{x}_h - \mu) \right\}, \tag{15}$$

where $\mu$ and $\Sigma$ are the mean vector and the covariance matrix
respectively.

Because the feature vector $\mathbf{x}_h$ is not independent of $\mathbf{x}_k$,
knowledge of $\mathbf{x}_k$ will reduce the uncertainty of $\mathbf{x}_h$.
In information theory, uncertainty is often measured
by entropy. So to estimate $p(\mathbf{x}_h | y, \mathbf{x}_k)$ from $p(\mathbf{x}_h | y)$, one
convenient approach (based on this basic information-theoretic
principle) is to reduce the entropy of $p(\mathbf{x}_h | y)$ by including the
knowledge of $\mathbf{x}_k$. It is known that the entropy of (15) is:

$$\ln \sqrt{(2\pi e)^d |\Sigma|}. \tag{16}$$

To reduce (16), we may modulate the covariance matrix $\Sigma$
by:

$$\Sigma^* = (1 - \beta) \Sigma, \tag{17}$$

where $\Sigma^*$ is the updated covariance matrix, and $\beta$ is a modulation
factor, determined by $\mathbf{x}_k$. Using the Gaussian function, $\beta$
can be defined as follows:

$$\beta = \exp\{ -\gamma \| \mathbf{x}_h - \mathbf{x}_k \| \}, \tag{18}$$

where the coefficient $\gamma$ controls the depth of modulation, and its
suitable value can be determined empirically by cross-validation.

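Thus, the conditional probability $p(\mathbf{x}_h | y, \mathbf{x}_k)$ is estimated
by modulating the covariance matrix $\Sigma$ of $p(\mathbf{x}_h | y)$, i.e.,

$$p(\mathbf{x}_h | y, \mathbf{x}_k) \sim \mathcal{N}(\mu, \Sigma^*), \tag{19}$$

where $\Sigma^*$ is the updated covariance matrix, which incorporates the
correlation between $\mathbf{x}_k$ and $\mathbf{x}_h$ (see (18)).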
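The modulation in (17)-(19) and the decision rule (14) can be sketched as below. Several simplifications are assumptions of this sketch rather than statements about the method: the covariance is taken to be diagonal for brevity, the two feature vectors are assumed to have equal length so that the norm in (18) is defined, and $\beta$ is capped just below 1 so that the shrunk covariance stays positive definite. All names are hypothetical.

```python
import math


def modulation_factor(x_h, x_k, gamma):
    """beta = exp(-gamma * ||x_h - x_k||) as in (18); assumes equal lengths."""
    if len(x_h) != len(x_k):
        raise ValueError("this sketch assumes vectors of equal length")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_h, x_k)))
    return math.exp(-gamma * dist)


def modulated_log_likelihood(x, mean, var, beta):
    """log N(x; mean, (1 - beta) * diag(var)), i.e. (19) with a diagonal covariance."""
    scale = 1.0 - min(beta, 0.999)  # assumed cap to avoid a singular covariance
    d = len(x)
    q = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    logdet = sum(math.log(vi) for vi in var)
    # Scaling the covariance by `scale` adds d * log(scale) to the log-determinant
    # and divides the quadratic form by `scale`.
    return -0.5 * (d * math.log(2.0 * math.pi) + logdet
                   + d * math.log(scale) + q / scale)


def fuse_decision(x_h, x_k, svm_posterior, class_models, gamma):
    """Rule (14): pick the class maximizing p(y | x_k) * p_{x_k}(x_h | y).

    svm_posterior maps class label -> p(y | x_k);
    class_models maps class label -> (mean, var) of the harmonic features.
    """
    beta = modulation_factor(x_h, x_k, gamma)
    scores = {}
    for label, (mean, var) in class_models.items():
        scores[label] = (math.log(svm_posterior[label])
                         + modulated_log_likelihood(x_h, mean, var, beta))
    return max(scores, key=scores.get)
```

With $\beta = 0$ the function reduces to the ordinary Gaussian log-likelihood of (15), so the modulation can be switched off for comparison by setting a very large $\gamma$.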
3) Calibrating SVMs’ output to probability:

After obtaining the likelihood from the MGC, to implement
the fusion approach in (14) we still need a posterior
probability from the SVM classifier, i.e., $p(y | \mathbf{x}_k)$. However,
standard SVMs do not provide posterior probabilities.
One convenient way to find this probability is to train
an additional sigmoid function to approximate the posterior
probability [14]. To explain the method, we first need to
introduce several necessary SVM formulas [13].

Table I
The number of runs and the total sample numbers for five types of vehicles: tracked vehicles $V1_t$ and $V2_t$; wheeled vehicles $V3_w$, $V4_w$ and $V5_w$.

Vehicle Class    Number of Runs    Total Number of Samples
$V1_t$           6                 1734
$V2_t$           6                 4230
$V3_w$           6                 5154
$V4_w$           6                 2358
$V5_w$           6                 2698

Let $\mathbf{x}_i$ be a feature (data) vector, $y_i \in \{+1, -1\}$ be the class
label, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_L)$ be the Lagrange multipliers, $L$
the number of examples and $b$ a threshold. The SVM classifier
can be represented as:

$$f(\mathbf{x}) = \sum_{i=1}^{L} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b,$$

where $K(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T \Phi(\mathbf{x}')$ is an appropriate kernel function
which has a corresponding inner product expansion, $\Phi$.
The commonly-used functions are polynomials and Gaussian
radial basis functions (RBFs):

$$K(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T \mathbf{x}' + 1)^d, \tag{20}$$

$$K(\mathbf{x}, \mathbf{x}') = \exp\left\{ -\frac{\| \mathbf{x} - \mathbf{x}' \|^2}{2\sigma^2} \right\}. \tag{21}$$

To get the posterior probability, we applied the mapping
method introduced in [14], where an additional sigmoid function
is used to approximate the necessary posterior probability.
In detail, the posterior probability is modeled by a sigmoid
function:

$$p(y | \mathbf{x}) \approx \frac{1}{1 + \exp(A f(\mathbf{x}) + B)}, \tag{22}$$

where the parameters $A$ and $B$ are found by minimizing the
following cross-entropy error function:

$$\arg\min_{A, B} \; -\sum_{i=1}^{L} \left[ t_i \log(p(y | \mathbf{x}_i)) + (1 - t_i) \log(1 - p(y | \mathbf{x}_i)) \right], \tag{23}$$

with $t_i = (y_i + 1)/2$. The details on the calculation of (23) can be
found in [14].

Based on the above discussions, a decision fusion approach
can be implemented, summarized as follows:

• An SVM is used to draw a decision based on the key
frequency feature vector $\mathbf{x}_k$;

• A maximum likelihood classifier, i.e. MGC, is applied to
the harmonic feature vector $\mathbf{x}_h$; and

• An improved fusion rule, proposed in (14), is then used
to achieve the final global decision.

Compared with other Bayesian fusion rules, e.g., (9) and
(10), the proposed method does not need the independence
assumption, and is based on a more accurate MAP criterion
(see (14)). Meanwhile, benefiting from the specific characteristics
of the application data (e.g., the multivariate Gaussian distribution
for harmonic features), its implementation is simplified,
avoiding more complicated methods, such as Bayesian
inference.
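The sigmoid calibration of (22) and (23) can be sketched as follows for the two-class case. This sketch substitutes plain gradient descent for the optimization procedure of [14], maps labels to targets with $t_i = (y_i + 1)/2$, and uses hypothetical function names and step-size settings; it is meant only to illustrate the idea.

```python
import math


def platt_probability(f, a, b):
    """p(y = +1 | x) ~ 1 / (1 + exp(a * f + b)), evaluated without overflow."""
    z = a * f + b
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit A and B of (22) by minimizing the cross-entropy (23).

    scores are SVM outputs f(x_i); labels are +1 or -1.
    """
    targets = [(y + 1) / 2 for y in labels]
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            # Derivative of the cross-entropy with respect to (a * f + b) is t - p.
            diff = t - platt_probability(f, a, b)
            grad_a += diff * f
            grad_b += diff
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

A fitted $A$ is normally negative, so that larger SVM outputs map to larger probabilities. In practice the scores used for fitting are usually taken from held-out data to limit overfitting of the calibration.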


V.  Experimental  results 

To assess the proposed information fusion approach, experiments
are carried out based on a multi-category vehicle
acoustic data set from US ARL [3]. The ARL data set consists
of recorded acoustic signals from five types of ground vehicles,
named $V1_t$, $V2_t$, $V3_w$, $V4_w$, and $V5_w$ (the subscript ‘t’
or ‘w’ stands for tracked or wheeled vehicles, respectively).
Each vehicle separately completes 6 circuits of a prearranged
track, and the corresponding acoustic signals are recorded for
the assessment.

To obtain a frequency domain representation, the fast Fourier
transform (FFT) is first applied to each second of the acoustic
signal with a Hamming window, and the resulting spectral
data (i.e., a 351-dimensional frequency domain vector $\mathbf{x}$) is
taken as one sample for these five vehicles. Then
feature extraction is carried out on the sample $\mathbf{x}$ to get the two
sets of features, i.e., the harmonic feature vector $\mathbf{x}_h$ and the
key frequency feature vector $\mathbf{x}_k$. Subsequently, these feature
vectors are fed into the classifier(s), and the final classification
result is obtained from the fusion algorithms.
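The framing step described above can be sketched as follows. The sampling rate is not stated here, so `sample_rate` is a free parameter of the sketch. A direct DFT is used only to keep the example dependency-free; a library FFT routine would normally be used instead. With one-second frames the bin spacing is 1 Hz, so keeping 351 bins would correspond to 0-350 Hz if the bins start at DC, which is an assumption. The function names are hypothetical.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal, sample_rate):
    """Split a signal into non-overlapping one-second frames, dropping any tail."""
    last_start = len(signal) - sample_rate
    return [signal[i:i + sample_rate] for i in range(0, last_start + 1, sample_rate)]


def spectrum(frame, n_bins):
    """Magnitudes of the first n_bins DFT bins of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    out = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        out.append(abs(acc))
    return out
```

Each vector returned by `spectrum` plays the role of one sample $\mathbf{x}$ from which the two feature sets are then extracted.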

The type label and the total number of spectral vectors for
each vehicle are summarized in Table I. A ‘run’ corresponds to
a vehicle completing a 360° circuit around the track and the sensor
array, and a sample is the FFT result for a one-second time
interval. Differences in the total numbers of samples reflect
the vehicles’ different moving speeds.

As discussed in Section IV, the features to be fused
come from harmonic extraction and mutual information
evaluation respectively. The left column of Figure 1 illustrates
the 351-dimensional spectral vectors for the five types of vehicles
(corresponding to $V1_t$ - $V5_w$, from top to bottom). For
each type of vehicle, 20 samples are illustrated in Figure 1, reflecting
the variations at different sampling times and different
runs. The right column of Figure 1 shows the 21-dimensional
harmonic features extracted from the above spectral vectors for
these five vehicles. The amplitudes of these harmonics form a
harmonic feature vector $\mathbf{x}_h = \{x_1, x_2, \ldots, x_{21}\}$.

From Figure 1, it can be seen that the spectral responses
of vehicles’ sounds are quite complex, consisting of many
formants that do not appear at the exact positions of the
integer multiples of the fundamental frequency. There are
also severe within-class variations in the acoustic features
(see the extracted harmonic features). As for the between-class
variations, there are many overlapping formants among the
5 vehicles. For example, Figure 1(a) and (g) have similar
peaks around 50 Hz and 100 Hz; Figure 1(g) and
(i) show similar frequency responses between 1 and 50 Hz. This
evidence shows that vehicle noises are highly complex,
and that it is difficult for a single feature set to cover all of the
acoustic characteristics.

Figure 1. Illustration of spectrum (left column) and harmonic features (right column) for five vehicles $V1_t$ (top) - $V5_w$ (bottom), respectively; 20 samples
(depicted in different colors) for each class.

In the experiments for accuracy comparison, half of the runs for
each vehicle (i.e., 3 runs) were randomly chosen to estimate
the statistical parameters for feature extraction, such as the
harmonic features’ mean vector $\mu$, covariance matrix $\Sigma$,
and mutual information $I$. The remaining runs form the test set
on which performance was assessed. Next, feature extraction was
carried out using the methods introduced in Section IV.
Following [3], the number of harmonics is set to 21, mainly for
consistency with that previous study. We note, however, that
there may be a minimum sufficient number of harmonics, which
will depend on the application.
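
The run-level split and the parameter estimates can be sketched as follows. This is an illustrative sketch only: the run identifiers 1 to 6, the function names, and the use of the unbiased (n - 1) covariance estimator are assumptions, not details taken from the experiments.

```python
import random


def split_runs(run_ids, n_train=3, rng=None):
    """Randomly pick n_train runs for training; the rest are held out."""
    rng = rng or random.Random()
    train = sorted(rng.sample(run_ids, n_train))
    test = sorted(r for r in run_ids if r not in train)
    return train, test


def mean_vector(samples):
    """Component-wise mean of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance_matrix(samples):
    """Unbiased sample covariance (divides by n - 1)."""
    n = len(samples)
    mu = mean_vector(samples)
    d = len(mu)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


# Six hypothetical run identifiers for one vehicle, split three and three.
train_runs, test_runs = split_runs([1, 2, 3, 4, 5, 6], rng=random.Random(0))

# Toy two-dimensional feature vectors standing in for training samples.
toy = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mu = mean_vector(toy)
sigma = covariance_matrix(toy)
```

Splitting by run rather than by individual sample keeps all one-second samples from the same circuit on the same side of the train/test boundary.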


As discussed previously, SVMs [12], [13], [15], [16] and a
multivariate Gaussian classifier (MGC) [5] were chosen as the
classifiers in these experiments. Because SVMs are inherently
binary (two-class) classifiers, one-against-one classifiers (one
for each pair of classes) were used, with subsequent majority
voting to give a multi-class result. The kernel function is an
inhomogeneous polynomial. The penalty parameter $C$ was tested
between $10^{-3}$ and $10^{5}$, and the polynomial order from
1 to 10, by a two-fold validation procedure using only the
training data. A polynomial order of 3 and $C = 20$ were found
to be the best values for this SVM and were applied in the
subsequent testing stage. The training data were also used to
estimate the mean vector and covariance matrix for the MGC.
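
The one-against-one scheme with majority voting can be sketched as below. The pairwise deciders are hypothetical stand-ins for trained binary SVMs (here, a toy nearest-centroid rule on a scalar input), the class names A to E are placeholders, and the kernel offset `coef0 = 1` is an assumption; only the polynomial order 3 comes from the text.

```python
from collections import Counter
from itertools import combinations


def poly_kernel(x, y, degree=3, coef0=1.0):
    """Inhomogeneous polynomial kernel (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over all pairwise binary deciders.

    pairwise[(a, b)] is a callable returning a or b for input x; it stands
    in for a trained binary SVM (hypothetical interface).
    """
    votes = Counter(pairwise[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Toy setting: five placeholder classes with scalar centroids.
centroids = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}
classes = sorted(centroids)


def make_decider(a, b):
    """Binary decider that prefers the class with the nearer centroid."""
    return lambda x: a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b


pairwise = {(a, b): make_decider(a, b) for a, b in combinations(classes, 2)}
label = one_vs_one_predict(2.9, classes, pairwise)
```

With five classes the scheme needs ten pairwise deciders, and the class that wins the most pairwise contests is returned.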

To reduce the bias introduced by random sampling, the test was
repeated 10 times, which also allows the variability inherent in
the sampling process to be estimated. The classification results
of the 10 tests for the different feature sets are shown in
Figure 2.

Figure 2. Comparison of classification accuracy for different feature sets
and fusion methods; 10 tests, each with 3 randomly chosen runs for training
and the remaining 3 runs for testing; the accuracy is the overall result for
all 5 types of vehicles.

Table II
Mean classification accuracy over the 10 tests.

Methods                              Average accuracy (%)
Harmonics feature set                73.44
Key frequency feature set            77.05
Feature-level fusion                 77.34
Decision-level fusion                83.86
Modified Bayesian decision fusion    84.24

From Figure 2, the following results are observed:

• Of the two individual feature sets, the key frequency feature
set (the second column) achieved better classification
accuracy than the harmonics feature set (the first column).
This supports our initial proposal of using mutual
information for feature extraction, as introduced in Section
III;

• Feature-level fusion (the third column) is slightly better
than either individual feature set (the first and second
columns), but remains very close to the better of the two
individual results. This phenomenon has been observed in the
previous sensor fusion literature;

• Decision-level fusion (the fourth and fifth columns)
achieved significant improvements over the individual
feature sets (the first and second columns), and is also much
better than feature-level fusion (the third column). This
demonstrates the efficacy of the proposed information fusion
approach; and

• The improvement from the modified fusion method (proposed
in Section IV-B) is found consistently in all 10 tests (see
the fifth column).

The average accuracies over the 10 tests are summarized in
Table II, which further demonstrates the effectiveness of the
proposed fusion methods.
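
For readers unfamiliar with decision-level fusion, the sketch below shows the textbook Bayesian product rule for combining the class posteriors of two classifiers, one per feature set. It assumes the two feature sets are conditionally independent given the class, and it is not the modified method of Section IV-B. The class names, posterior values, and priors are invented for illustration.

```python
def bayes_fuse(post_a, post_b, prior):
    """Fuse two classifiers' posteriors, assuming conditional independence.

    P(c | a, b) is proportional to P(c | a) * P(c | b) / P(c).
    """
    raw = {c: post_a[c] * post_b[c] / prior[c] for c in prior}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


# Invented two-class example: one posterior per feature set.
prior = {"A": 0.5, "B": 0.5}
post_harmonic = {"A": 0.6, "B": 0.4}   # classifier fed the harmonic features
post_keyfreq = {"A": 0.7, "B": 0.3}    # classifier fed the key frequency features
fused = bayes_fuse(post_harmonic, post_keyfreq, prior)
decision = max(fused, key=fused.get)
```

When both classifiers lean towards the same class, the fused posterior is more confident than either input, which is the intuition behind combining decisions rather than concatenating features.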

VI.  Conclusions 

In this paper, we developed an information fusion approach
for acoustic ground vehicle classification. First, we argued
that multiple feature sets are needed to improve the vehicles’
classification accuracy. Then, a key frequency feature vector
was added to the harmonic feature vector to recover
discriminatory information that the harmonics alone ignore.
Finally, a modified Bayesian decision fusion method was proposed
to better combine the two sets of features. Experiments were
carried out to assess the classification accuracies of the
fusion approaches, based on a multi-category vehicle acoustic
data set. The results showed that a significant improvement in
classification accuracy was achieved by the decision-level
fusion approach. Future research will address the stability of
the features with regard to changes in vehicle velocity, and
extend this approach to other, more complicated data sets.

Acknowledgments 

The  research  was  sponsored  by  the  U.S.  Army  Research 
Laboratory and the U.K. Ministry of Defence and was
accomplished under Agreement Number W911NF-06-3-0001.
The  views  and  conclusions  contained  in  this  document  are 
those  of  the  author(s)  and  should  not  be  interpreted  as 
representing  the  official  policies,  either  expressed  or  implied, 
of  the  U.S.  Army  Research  Laboratory,  the  U.S.  Government, 
the  U.K.  Ministry  of  Defence  or  the  U.K.  Government.  The 
U.S.  and  U.K.  Governments  are  authorized  to  reproduce  and 
distribute  reprints  for  Government  purposes  notwithstanding 
any copyright notation hereon.

References 

[1] H. Wu, M. Siegel, and P. Khosla. Vehicle sound signature recognition
by frequency vector principal component analysis. IEEE Transactions
on Instrumentation and Measurement, 48(5):1005-1009, 1999.

[2]  M.  Duarte  and  Y.H.  Hu.  Vehicle  classification  in  distributed  sensor 
networks.  Journal  of  Parallel  and  Distributed  Computing,  64:826-838, 
2004. 

[3] T. Raju Damarla and Gene Whipps. Multiple target tracking and
classification improvement using data fusion at node level using acoustic
signals. Technical Report, ARL.

[4]  H.  Wu  and  J.M.  Mendel.  Classification  of  battlefield  ground  vehicles 
using  acoustic  features  and  fuzzy  logic  rule-based  classifiers.  IEEE 
Transactions on Fuzzy Systems, 15(1):56-72, 2007.

[5]  T.  R.  Damarla,  T.  Pham,  and  D.  Lake.  An  algorithm  for  classifying 
multiple targets using acoustic signature. In Proceedings of SPIE Signal
Processing,  Sensor  Fusion  and  Target  Recognition,  2004. 

[6]  D.  Lake.  Harmonic  phase  coupling  for  battlefield  acoustic  target 
identification.  In  Proceedings  of  IEEE  International  Conference  on 
Acoustics,  Speech,  and  Signal  Processing,  1998. 

[7]  D.  Lake.  Tracking  fundamental  frequency  for  synchronous  mechanical 
diagnostic  signal  processing.  In  Proceedings  of  9th  IEEE  Signal 
Processing  Workshop  on  Statistical  Signal  and  Array  Processing,  1998. 

[8]  R.  Battiti.  Using  mutual  information  for  selecting  features  in  supervised 
neural  net  learning.  IEEE  Transactions  on  Neural  Networks,  5(4):537- 
550,  July  1994. 

[9]  B.  Guo,  S.R.  Gunn,  R.I.  Damper,  and  J.D.B.  Nelson.  Band  selection 
for  hyperspectral  image  classification  using  mutual  information.  IEEE 
Geoscience and Remote Sensing Letters, 4(3):522-526, 2007.

[10]  B.  Guo,  R.  I.  Damper,  S.  R.  Gunn,  and  J.  D.  B.  Nelson.  A 
fast  separability-based  feature  selection  method  for  high-dimensional 
remotely-sensed image classification. Pattern Recognition,
41(5):1670-1679, 2008.

[11] J. Manyika and H. Durrant-Whyte. Data Fusion and Sensor
Management: A Decentralized Information-Theoretic Approach. Ellis
Horwood, New York, London, 1994.

[12]  Bernhard  E.  Boser,  Isabelle  M.  Guyon,  and  Vladimir  N.  Vapnik.  A 
training  algorithm  for  optimal  margin  classifiers.  In  Proceedings  of  the 
fifth  annual  workshop  on  Computational  learning  theory,  pages  144- 
152,  Pittsburgh,  Pennsylvania,  United  States,  1992. 

[13] C. Cortes and V. N. Vapnik. Support-vector networks. Machine
Learning, 20(3):1-25, 1995.

[14]  J.  Platt.  Advances  in  large  margin  classifiers,  chapter  Probabilistic 
outputs  for  support  vector  machines  and  comparison  to  regularized 
likelihood  methods.  Cambridge:  MIT  Press,  2000. 

[15]  C.  Burges.  A  tutorial  on  support  vector  machines  for  pattern  recognition. 
Knowledge Discovery and Data Mining, 2(2):121-167, 1998.

[16]  V.  Vapnik.  An  overview  of  statistical  learning  theory.  IEEE  Transactions 
on  Neural  Networks,  10(5):988-999,  September  1999. 




